Preserve legacy Xarray merge settings #1016

tomvothecoder · 2025-11-11T22:00:50Z

Description

This PR preserves legacy Xarray merge behavior in response to breaking changes introduced in newer versions of Xarray. The changes ensure backward compatibility by explicitly setting compat="no_conflicts" and join="outer" when merging datasets, which were the default behaviors before Xarray v2025.08.0.

Key Changes:

Introduced a global constant LEGACY_XARRAY_MERGE_KWARGS to centralize merge configuration
Applied legacy merge settings to xr.merge() calls in dataset utilities
Refactored duplicate code in mp_partition_driver.py by extracting a reusable _open_mfdataset() function
Applied legacy merge settings to xr.open_mfdataset() in the new _open_mfdataset() helper function

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

If applicable:

New and existing unit tests pass with my changes (locally and CI/CD build)
I have added tests that prove my fix is effective or that my feature works
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

Copilot

Pull Request Overview

This PR preserves legacy Xarray merge behavior in response to breaking changes introduced in newer versions of Xarray. The changes ensure backward compatibility by explicitly setting compat="no_conflicts" and join="outer" when merging datasets, which were the default behaviors before Xarray v2025.08.0.

Key Changes:

Introduced a global constant LEGACY_XARRAY_MERGE_KWARGS to centralize merge configuration
Applied legacy merge settings to xr.merge() calls in dataset utilities
Applied legacy merge settings to xr.open_mfdataset() in the new _open_mfdataset() helper function
Refactored duplicate code in mp_partition_driver.py by extracting a reusable _open_mfdataset() function

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`e3sm_diags/__init__.py`	Defines `LEGACY_XARRAY_MERGE_KWARGS` constant with legacy Xarray merge settings
`e3sm_diags/driver/utils/dataset_xr.py`	Applies legacy merge kwargs to two `xr.merge()` calls
`e3sm_diags/driver/tropical_subseasonal_driver.py`	Adds `join="outer"` to existing `xr.merge()` call already using `compat="override"`
`e3sm_diags/driver/mp_partition_driver.py`	Extracts duplicated code into `_open_mfdataset()` helper and applies legacy merge settings; improves docstring formatting

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

e3sm_diags/driver/mp_partition_driver.py

tomvothecoder

Hey @mahf708 and @chengzhuzhang, this PR is ready for review. It sets legacy Xarray settings on remaining xr.open_mfdataset() and xr.merge() calls to prevent a FutureWarning about changing kwargs.

@mahf708 Can you try pip installing the build from this branch in your env and see if the FutureWarnings are now gone?

tomvothecoder · 2025-11-11T22:02:12Z

e3sm_diags/driver/mp_partition_driver.py

    try:
-        # xr.open_mfdataset() can accept an explicit list of files.
-        landfrac = xr.open_mfdataset(glob.glob(f"{test_data_path}/LANDFRAC_*")).sel(
-            lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
-        )["LANDFRAC"]
-        temp = xr.open_mfdataset(glob.glob(f"{test_data_path}/T_*.nc")).sel(
-            lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
-        )["T"]
-        cice = xr.open_mfdataset(glob.glob(f"{test_data_path}/CLDICE_*.nc")).sel(
-            lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
-        )["CLDICE"]
-        cliq = xr.open_mfdataset(glob.glob(f"{test_data_path}/CLDLIQ_*.nc")).sel(
-            lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
-        )["CLDLIQ"]
+        landfrac = _open_mfdataset(test_data_path, "LANDFRAC", start_year, end_year)
+        temp = _open_mfdataset(test_data_path, "T", start_year, end_year)
+        cice = _open_mfdataset(test_data_path, "CLDICE", start_year, end_year)
+        cliq = _open_mfdataset(test_data_path, "CLDLIQ", start_year, end_year)
    except OSError:
        logger.info(
            f"No files to open for variables within {start_year} and {end_year} from {test_data_path}."
        )
        raise

    parameter.test_name_yrs = test_data.get_name_yrs_attr(season)

    metrics_dict["test"] = {}
    metrics_dict["test"]["T"], metrics_dict["test"]["LCF"] = compute_lcf(
        cice, cliq, temp, landfrac
    )

    if run_type == "model-vs-model":
        ref_data = Dataset(parameter, data_type="ref")
-
        ref_data_path = parameter.reference_data_path
-        start_year = parameter.ref_start_yr
-        end_year = parameter.ref_end_yr
-        # xr.open_mfdataset() can accept an explicit list of files.
+
        try:
-            landfrac = xr.open_mfdataset(glob.glob(f"{ref_data_path}/LANDFRAC_*")).sel(
-                lat=slice(-70, -30),
-                time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
-            )["LANDFRAC"]
-            temp = xr.open_mfdataset(glob.glob(f"{ref_data_path}/T_*.nc")).sel(
-                lat=slice(-70, -30),
-                time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
-            )["T"]
-            cice = xr.open_mfdataset(glob.glob(f"{ref_data_path}/CLDICE_*.nc")).sel(
-                lat=slice(-70, -30),
-                time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
-            )["CLDICE"]
-            cliq = xr.open_mfdataset(glob.glob(f"{ref_data_path}/CLDLIQ_*.nc")).sel(
-                lat=slice(-70, -30),
-                time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
-            )["CLDLIQ"]


I extracted common code into an _open_mfdataset() function at the bottom of this file to make the code DRY.

tomvothecoder · 2025-11-11T22:02:20Z

e3sm_diags/driver/mp_partition_driver.py

+def _open_mfdataset(
+    data_path: str, var: str, start_year: int, end_year: int
+) -> xr.DataArray:
+    """
+    Open multiple NetCDF files as a single xarray Dataset and subset by time
+    and latitude.
+
+    This function reads multiple NetCDF files matching the specified variable
+    name and combines them into a single xarray Dataset. The data is then
+    subsetted based on the specified time range and latitude bounds.
+
+    Parameters
+    ----------
+    data_path : str
+        The path to the directory containing the NetCDF files.
+    var : str
+        The variable name to match in the file pattern.
+    start_year : int
+        The starting year for the time subsetting.
+    end_year : int
+        The ending year for the time subsetting.
+
+    Returns
+    -------
+    xr.DataArray
+        The subsetted DataArray for the specified variable, filtered by time
+        and latitude.
+    """
+    file_pattern = f"{data_path}/{var}_*.nc"
+    ds = xr.open_mfdataset(
+        glob.glob(file_pattern),
+        data_vars="minimal",
+        **LEGACY_XARRAY_MERGE_KWARGS,  # type: ignore[ arg-type ]
+    )
+
+    ds_sub = ds.sel(
+        lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
+    )
+
+    da_var = ds_sub[var]
+
+    return da_var


The DRY function.

tomvothecoder · 2025-11-11T22:02:43Z

e3sm_diags/__init__.py

+# Settings to preserve legacy Xarray behavior when merging datasets.
+# See https://xarray.pydata.org/en/stable/user-guide/io.html#combining-multiple-files
+# and https://xarray.pydata.org/en/stable/whats-new.html#id14
+LEGACY_XARRAY_MERGE_KWARGS = {
+    # "override", "exact" are the new defaults as of Xarray v2025.08.0
+    "compat": "no_conflicts",
+    "join": "outer",
+}


The constant variable for legacy xarray merge settings.

chengzhuzhang · 2025-11-12T21:14:53Z

@tomvothecoder The logs got messy after this xarray update. Thanks for working on this PR to fix this. The code change looks good to me.

tomvothecoder marked this pull request as ready for review November 11, 2025 22:00

Preserve legacy Xarray merge settings for compat and join

33dc8b3

tomvothecoder force-pushed the devops/1012-xr-settings branch from 09da560 to 33dc8b3 Compare November 11, 2025 22:01

tomvothecoder requested a review from Copilot November 11, 2025 22:01

Copilot started reviewing on behalf of tomvothecoder November 11, 2025 22:01 View session

Copilot finished reviewing on behalf of tomvothecoder November 11, 2025 22:02

Copilot AI reviewed Nov 11, 2025

View reviewed changes

e3sm_diags/driver/mp_partition_driver.py Show resolved Hide resolved

tomvothecoder commented Nov 11, 2025

View reviewed changes

tomvothecoder self-assigned this Nov 11, 2025

tomvothecoder requested a review from chengzhuzhang November 11, 2025 22:08

tomvothecoder assigned mahf708 Nov 11, 2025

Restore ref start and end years

49a8614

tomvothecoder merged commit e65bf2a into main Nov 12, 2025
5 checks passed

tomvothecoder deleted the devops/1012-xr-settings branch November 12, 2025 21:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Preserve legacy Xarray merge settings #1016

Preserve legacy Xarray merge settings #1016

tomvothecoder commented Nov 11, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

tomvothecoder left a comment

Uh oh!

tomvothecoder Nov 11, 2025

Uh oh!

tomvothecoder Nov 11, 2025

Uh oh!

tomvothecoder Nov 11, 2025

Uh oh!

chengzhuzhang commented Nov 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Preserve legacy Xarray merge settings #1016

Preserve legacy Xarray merge settings #1016

Conversation

tomvothecoder commented Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Key Changes:

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Key Changes:

Reviewed Changes

Uh oh!

Uh oh!

tomvothecoder left a comment

Choose a reason for hiding this comment

Uh oh!

tomvothecoder Nov 11, 2025

Choose a reason for hiding this comment

Uh oh!

tomvothecoder Nov 11, 2025

Choose a reason for hiding this comment

Uh oh!

tomvothecoder Nov 11, 2025

Choose a reason for hiding this comment

Uh oh!

chengzhuzhang commented Nov 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tomvothecoder commented Nov 11, 2025 •

edited

Loading